(54) A reconfigurable signal processor with embedded flash memory device



(57) The present invention relates to a dynamically
reconfigurable processing unit (1) including an embedded
Flash memory device (4) for non-volatile storage of
code, data and bit-streams, the unit (1) being integrated
into a single chip together with a microprocessor (2)
core. Advantageously, the processing unit further comprises
an S-RAM based embedded FPGA unit (3) structured
for FPGA reconfigurations and having a specific
programming interface (7) connected to a port (FP) of said
Flash memory device (4) through a DMA channel (8).
Description

Field of invention 

[0001] The present invention relates to a dynamically 
reconfigurable processing unit tightly connected to a 
Flash EEPROM memory subsystem. 
[0002] More specifically, the invention relates to a
reconfigurable signal processing IC with an embedded
Flash memory device for non-volatile storage of code,
data and bit-streams, the unit being integrated into a
single chip together with a microprocessor core.

Prior art 

[0003] As is well known by those skilled in this technical
field, the increasing complexity of system design and
shorter time-to-market requirements are leading research
towards the investigation of hybrid systems including
processors enhanced by programmable logic.
[0004] In this respect, reference is made to the work
by Young-Don Bae et al., "A Single-Chip Programmable
Platform Based on a Multithreaded Processor and
Configurable Logic Clusters", ISSCC 2002 Digest of
Technical Papers, pp. 336-337, Feb. 2002.
[0005] A further reference is the article by Zhang et al.,
"A 1 V Heterogeneous Reconfigurable Processor IC for
Baseband Wireless Applications", ISSCC 2000 Digest of
Technical Papers, pp. 68-69, 488, Feb. 2000.
[0006] At the same time, the rising cost of mask sets
and the shorter time-to-market available for new products
are leading to the introduction of systems with a higher
degree of programmability and configurability, such as
systems-on-chip with configurable processors, embedded
FPGA and embedded flash memory.
[0007] Moreover, the availability of an advanced embedded
flash technology, based on a NOR architecture,
together with innovative IPs, such as embedded flash
macrocells with special features, is a key factor.
[0008] For a better understanding of the present in- 
vention reference is also made to the Field Programma- 
ble Gate Array (FPGA) technology combining standard 
processors with embedded FPGA devices. 
[0009] These solutions make it possible to configure into
the FPGA, at deployment time, exactly the required
peripherals, exploiting temporal re-use by dynamically
reconfiguring the instruction set at run time based on the
currently executed algorithm.

[0010] The existing models for designing FPGA/processor
interaction can be grouped in two main categories:

the FPGA is a co-processor communicating with the
main processor through a system bus or a specific
I/O channel;

the FPGA is described as a function unit of the
processor pipeline.

[0011] The first group includes the GARP processor,
known from the article by T. Callahan, J. Hauser, and J.
Wawrzynek having title: "The Garp architecture and C
compiler", IEEE Computer, 33(4): 62-69, April 2000. A
similar architecture is provided by the A-EPIC processor
that is disclosed in the article by S. Palem and S. Talla
having title: "Adaptive explicit parallel instruction
computing", Proceedings of the fourth Australasian Computer
Architecture Conference (ACOAC), January 2001.
[0012] In both cases the FPGA is addressed via dedicated
instructions, moving data explicitly to and from
the processor. Control hardware is kept to a minimum,
since no interlocks are needed to avoid hazards, but a
significant overhead in clock cycles is required to
implement communication.

[0013] The communication overhead may be considered
negligible only when the number of cycles per execution
of the FPGA is relatively high.
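
The trade-off described in paragraphs [0012] and [0013] can be illustrated with a minimal sketch. The cycle counts below are hypothetical and chosen only to show the trend; they are not taken from the cited architectures.

```python
def overhead_fraction(compute_cycles: int, transfer_cycles: int) -> float:
    """Share of total cycles spent moving data to and from the FPGA."""
    total = compute_cycles + transfer_cycles
    return transfer_cycles / total


# Hypothetical figure: 20 cycles of explicit data movement per invocation.
for compute in (10, 100, 1000):
    share = overhead_fraction(compute, transfer_cycles=20)
    print(f"{compute:5d} compute cycles -> {share:.1%} overhead")
```

Under these assumed numbers the transfer cost dominates short FPGA invocations and becomes marginal only for long ones.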

[0014] In the commercial world, FPGA suppliers such
as Altera Corporation offer digital architectures based
on the US Patent No. 5,968,161 to T.J. Southgate, "FPGA
based configurable CPU additionally including second
programmable section for implementation of custom
hardware support".

[0015] Other suppliers (Xilinx, Triscend) offer chips
containing a processor embedded on the same silicon
IC with embedded FPGA logic. See for instance the US
Patent 6,467,009 to S.P. Winegarden et al., "Configurable
Processor System Unit", assigned to Triscend
Corporation.

[0016] However, in those chips the processor and the
FPGA are generally loosely coupled by a high speed
dedicated bus, performing as two separate execution units
rather than being merged in a single architectural entity.
In this manner the FPGA does not have direct access to
the processor memory subsystem, which is one of the
strengths of the academic approaches outlined above.

[0017] In the second category (FPGA as a function
unit) we find architectures commercially known as
"PRISC", "Chimaera" and "ConCISe".
[0018] In all these models, data are read and written
directly on the processor register file, minimizing the
overhead due to communication. In most cases, to minimize
control logic and hazard handling and to fit in the
processor pipeline stages, the FPGA is limited to
combinatorial logic only, thus severely limiting the
performance boost that can be achieved.

[0019] These solutions represent a significant step toward
a low-overhead interface between the two entities.
Nevertheless, due to the granularity of FPGA operations
and its hardware oriented structure, their approach is
still very coarse-grained, reducing the possible resource
usage parallelism and again including hardware issues
that are neither familiar nor friendly to software
compilation tools and algorithm developers.

[0020] Thus, a relevant drawback in this approach is
the memory data access bottleneck, which often forces
long stalls on the FPGA device in order to fetch into
the shared registers enough data to justify its activation.
[0021] The technical problem of the present invention
is that of providing a new kind of reconfigurable
processing unit tightly connected to a memory architecture
having functional and structural features capable of
offering significant performance and energy consumption
enhancements with respect to a traditional signal
processing device.

Summary of invention 

[0022] The invention overcomes the limitations of
similar preceding architectures by relying on an embedded
device of a different nature and on a new approach to
the processor/memory interface.

[0023] According to a first embodiment of the present 
invention, the reconfigurable processing unit targets im- 
age-voice processing and recognition application do- 
mains by joining a configurable and extensible proces- 
sor core and an SRAM-based embedded FPGA. 
[0024] More specifically, the processing unit according
to the invention further includes an S-RAM based
embedded FPGA unit structured for FPGA reconfigurations
and having a specific programming interface connected
to a port FP of said Flash memory device through a
DMA channel.

[0025] The features and advantages of the process- 
ing unit according to this invention will become apparent 
from the following description of a best mode for carrying 
out the invention given by way of non-limiting example 
with reference to the enclosed drawings. 

Brief description of the drawings

[0026] 

Figure 1 is a block diagram of a processing unit
architecture for data processing according to the
present invention;

Figure 2 is a block diagram of a Flash memory
architecture embedded into the processing unit of
Figure 1;

Figure 3 is a schematic view of the system memory
hierarchy provided by the present invention;

Figure 4 is a block diagram of a specific processor
extension, for instance an example of added DSP
instructions;

Figure 5 is a block diagram of a further specific
processor extension, for instance an optimized
fixed-point calculation of the square root;

Figure 6 is a table view showing the overall
performance improvements for a face recognition task
implemented by the processing unit of the present
invention;

Figure 7 is a schematic chip micrograph.

Detailed description

[0027] With reference to the drawings, generally
shown at 1 is a processing unit realized according
to the present invention for digital signal processing
based on reconfigurable computing.
[0028] The processing unit 1 includes an embedded
Flash memory device 4 for non-volatile storage of code,
data and bit-streams and a further S-RAM based embedded
FPGA unit 3 realized for the configuration purposes
of the present invention.
[0029] More specifically, an 8Mb application-specific
embedded flash memory device 4 is disclosed. The
memory device 4 is integrated into a single chip together
with a microprocessor 2 and the FPGA structure 3.
[0030] Advantageously, application-specific hardware
units are added and dynamically modified by the
embedded FPGA 3 reconfiguration. By implementing
application-specific vector processing instructions the
processing unit 1 shows a peak computing power of
1GOPS.

[0031] Efficient read-write-erase access to code, data
and FPGA bitstreams is provided by the Flash memory
device 4 based on a modular 8Mb, 4-bank Flash memory,
as will be more clearly explained hereinafter.
[0032] The processing unit 1 comprises three
content-specific I/O ports and delivers an aggregate peak
read throughput of 1.2GB/s.

[0033] The system architecture 1 is illustrated in
Figure 1.

[0034] The functional purposes of the embedded FPGA 3
are:

i) extension of the processor datapath supporting a
set of additional special-purpose C-callable
microprocessor instructions;

ii) bus-mapped coprocessors, connected to the system
bus through a master/slave interface;

iii) flexible I/O to connect external units or sensors
with application-specific communication protocols.

[0035] Even though such different circuit purposes
would require different kinds of programmable logic for
best implementation of either arithmetic-dominated or
control-dominated logic, a single programmable logic
subsystem 3 has been implemented to be shared
among different purposes both in space (same
configuration) and time (subsequent configurations).
[0036] The single, high I/O count, fine-grain e-FPGA
3 operates as a datapath for the microprocessor pipeline
and as dedicated control logic for bus coprocessor and
I/O control interface. The FPGA has a specific
programming interface 7 connected to a port FP of said Flash
memory device 4 through a DMA channel 8.
[0037] FPGA reconfiguration is concurrent to soft- 
ware execution. 

[0038] A local bus 6 connects a dedicated 32-bit Flash 
memory port FP to the FPGA programming interface 7. 
[0039] A DMA channel 8 handles the bitstream transfer
while the microprocessor fetches instructions and data
from different Flash memory ports: a 64-bit wide code
port (CP) and a data port (DP).

[0040] To support streaming applications a 1kB dual- 
port buffer 9 is used to interface fast decoding hardware 
and slower software running on the processor 2. 
[0041] The memory sub-system architecture is shown 
in Figure 2. 

[0042] The modular structure of the memory (dotted
line) includes:

- charge pumps 10 (Power Block);

- testability circuits 11 (DFT);

- a power management arbiter 12 (PMA); and,

- a customizable array 13 of N independent 2Mb flash
memory modules 16.

[0043] The number N may be chosen depending on the
storage requirements; N=4 in the current implementation.

[0044] The modular memory features (N+2) 128-bit
target ports and implements an N-bank uniform memory
13.

[0045] As previously mentioned, three content-specific
ports are dedicated to code (CP, 64-bit wide), data
(DP, 64-bit) and FPGA bit stream configurations (FP,
32-bit). A 128-bit sub-system crossbar 15 connects all
the architecture blocks and the eight-bit microprocessor
2.

[0046] The main features of the flash memory device 4
are: the sharing of the charge pumps 10 among different
flash memory modules 16 through the PMA arbiter 12
in a multi-bank fashion; the use of a small eight-bit
microprocessor 2 to ease memory system test and to add
complex functionalities for data management; and the
use of an ADC (Analog-to-Digital Converter), required
by the application, to increase system self-test
capability.

[0047] The third port FP of the Flash device 4 is
dedicated to managing embedded-FPGA (e-FPGA)
configuration data stored in flash memory modules. The FP
port is read-only and provides fast sequential access for
downloading bit streams.

[0048] The FP port has four configuration registers
replicating the information stored in the CP port that must
be used in order to write e-FPGA configuration data.



[0049] The output data word bus and the address bus
are 32 bits wide. The FP port uses a chip select to
access the addressable memory space, and a burst
enable to allow burst serial access.

[0050] In read operation, an output ready signal is tied
low when data are not immediately available, so that it
can act as a wait state signal.
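
The read handshake of paragraphs [0049] and [0050] can be sketched as a small cycle-level model. This is an illustration only: the function name, the per-cycle ready pattern and the word values are hypothetical, and chip select and burst enable are assumed to stay asserted for the whole burst.

```python
from typing import Iterable, Iterator, List, Tuple


def burst_read(words: List[int], ready_pattern: Iterable[bool]) -> Iterator[Tuple[int, int]]:
    """Yield (cycle, word) pairs for a sequential burst on a read-only port.

    `ready_pattern` gives the output-ready level for each clock cycle:
    False (tied low) inserts a wait state, True delivers the next word.
    """
    index = 0
    for cycle, ready in enumerate(ready_pattern):
        if index == len(words):
            break
        if ready:
            yield cycle, words[index]
            index += 1


# Two wait states (cycles 0 and 3) stretch a three-word burst to five cycles.
print(list(burst_read([0xA, 0xB, 0xC], [False, True, True, False, True])))
# -> [(1, 10), (2, 11), (4, 12)]
```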
[0051] The eight-bit microprocessor 2 (uP) performs
additional complex functions (defragmentation,
compression, virtual erase, etc.) not natively supported by
the DP port, and assists in the built-in self test of the
memory system.

[0052] The (N+2)x4 128-bit crossbar 15 connects the
modular memory with the four initiators (CP, DP, FP and
uP), ensuring that at least three flash memory modules
16 can be read in parallel at full speed.
[0053] The memory space of the four modules 16 is
arranged in three programmable user-defined partitions,
each one devoted to a port. The memory system
clock can run up to 100MHz, and reading three modules
16 with a 128-bit data bus and 40ns access time results
in a peak read throughput of 1.2GB/s.
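
The peak figure follows directly from the bus width and the access time. The short calculation below reproduces it, assuming decimal units (1 MB = 10^6 bytes) and one full-width word per access; the function name is illustrative.

```python
def module_throughput_bytes_per_s(bus_bits: int, access_time_ns: int) -> float:
    """Peak read rate of one module delivering one bus word per access."""
    bytes_per_access = bus_bits // 8
    return bytes_per_access * 1_000_000_000 / access_time_ns


per_module = module_throughput_bytes_per_s(128, 40)   # 16 bytes every 40 ns
aggregate = 3 * per_module                            # three modules read in parallel
print(per_module / 1e6, aggregate / 1e9)              # 400.0 (MB/s) 1.2 (GB/s)
```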
[0054] Each 2Mb flash memory module 16 has a
128-bit IO data bus with 40ns access time, resulting in
400Mbyte/s, and a program/erase control unit.
Simultaneous memory operations use the power management
arbiter 12 (PMA) for optimal scheduling.
[0055] Available power and user-defined priorities are
considered to schedule conflicting resource requests in
a single clock cycle.

[0056] The memory device 4 allows up to four simul- 
taneous operations, with a limit of one both for write and 
erase. 
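
A minimal sketch of such an arbitration round is given below. It is not the PMA implementation: it assumes that "a limit of one both for write and erase" means at most one write and at most one erase at a time, it uses a hypothetical integer priority (lower is more urgent), and it leaves the available-power check out of the model.

```python
from dataclasses import dataclass
from typing import List

MAX_SIMULTANEOUS = 4                 # from the text: up to four simultaneous operations
LIMITED_KINDS = {"write", "erase"}   # assumed reading: at most one of each


@dataclass
class Request:
    port: str       # e.g. "CP", "DP", "FP", "uP"
    kind: str       # "read", "write" or "erase"
    priority: int   # hypothetical user-defined priority, lower is more urgent


def grant(requests: List[Request]) -> List[Request]:
    """Pick the requests that may proceed in this arbitration round."""
    granted: List[Request] = []
    used_limited = set()
    for req in sorted(requests, key=lambda r: r.priority):
        if len(granted) == MAX_SIMULTANEOUS:
            break
        if req.kind in LIMITED_KINDS:
            if req.kind in used_limited:
                continue             # a second write (or erase) must wait
            used_limited.add(req.kind)
        granted.append(req)
    return granted
```

With five pending requests that include two writes, the lower-priority write is deferred and the remaining four operations proceed together.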

[0057] Figure 3 depicts the memory hierarchy and
parallelism across the processing unit 1. The ports CP
and DP are interfaced to the 64-bit, 800MB/s AHB
system bus 6.

[0058] At a system clock rate of 100MHz each I/O port
can independently operate at maximum speed. So, an
aggregate peak read rate of 1.2GB/s can be sustained
as it is limited by memory access time.
[0059] In the current implementation the e-FPGA
reconfiguration takes 500µs at 100 MHz. An average
throughput of 50MB/s, out of the available 400MB/s, is
currently sustained by the e-FPGA configuration interface
7.
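
As a rough consistency check, the two figures of paragraph [0059] can be combined. The configuration size is not stated in the text; the value computed below is only what the numbers imply if the 50MB/s average holds over the whole 500µs, with decimal megabytes assumed.

```python
def implied_bitstream_bytes(reconfig_time_us: int, throughput_mb_per_s: int) -> int:
    """Bytes moved in `reconfig_time_us` at `throughput_mb_per_s` (decimal MB).

    Microseconds times megabytes per second gives bytes directly,
    because the 1e-6 and 1e6 factors cancel.
    """
    return reconfig_time_us * throughput_mb_per_s


bitstream = implied_bitstream_bytes(500, 50)   # figures from paragraph [0059]
port_utilisation = 50 / 400                    # sustained vs. available MB/s
print(bitstream, port_utilisation)             # 25000 0.125
```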

[0060] System performance is being evaluated for an
image processing application (facial recognition) and a
speech recognition application.
[0061] More than 20 specific instructions were designed
as C/assembly-callable functions, automatically
translated to RTL, then synthesized and mapped to the
e-FPGA.

[0062] Figures 4 and 5 show two examples of specific
microprocessor extensions.

[0063] Figure 4 relates to an eight-issue, eight-bit L2
calculation that accounts for 23 eight-bit arithmetic
operations and six 64-bit operations, requiring about 10k
ASIC equivalent gates.

[0064] Figure 5 relates to a datapath for an optimized
fixed-point calculation of the square root that accounts
for twelve 32-bit operations, for about 2k ASIC equivalent
gates.
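
The two extensions can be pictured with a behavioural sketch. Neither function is taken from the figures. The first assumes that the "L2 calculation" is a squared Euclidean distance over eight 8-bit samples, a reading under which 8 subtractions, 8 multiplications and 7 additions would add up to the 23 operations quoted. The second uses a standard bit-by-bit integer square root as a stand-in for the fixed-point datapath.

```python
from typing import Sequence


def l2_squared_8x8(a: Sequence[int], b: Sequence[int]) -> int:
    """Squared L2 distance between two vectors of eight 8-bit samples."""
    assert len(a) == len(b) == 8
    diffs = [x - y for x, y in zip(a, b)]      # 8 subtractions
    squares = [d * d for d in diffs]           # 8 multiplications
    return sum(squares)                        # reduction; a hardware adder tree needs 7 additions


def isqrt_fixed(value: int) -> int:
    """Integer square root of a 32-bit value, one result bit per iteration."""
    assert 0 <= value < 1 << 32
    root = 0
    bit = 1 << 30                  # largest power of four that fits in 32 bits
    while bit > value:
        bit >>= 2
    while bit:
        if value >= root + bit:
            value -= root + bit
            root = (root >> 1) + bit
        else:
            root >>= 1
        bit >>= 2
    return root
```

In software each call takes many instructions; mapping the same dataflow onto the e-FPGA is what allows it to be issued as a single processor instruction.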

[0065] The overall performance improvements for the
face recognition tasks are shown in the table of Figure 6.
[0066] Execution time is compared for a 32-bit RISC
with basic DSP extensions (MAC, zero-overhead loops,
etc.) and the same processor enhanced with
application-specific instructions.

[0067] Measured speed-ups range from 1.8x to 10.6x
(on the most demanding task), with an overall
improvement of 8.5x. It must be noticed that switching between
algorithm stages requires only one reconfiguration of
the e-FPGA. Reconfiguration time is negligible.
[0068] The speed-up factors take into account the
possible multi-cycle clock penalty due to processor-FPGA
synchronization in the case of instruction extensions
slower than the processor clock. Energy efficiency
figures are reported in Figure 6 too.
[0069] Although the average power consumption of the
system extended with the e-FPGA is slightly higher
(10-15%), the energy reduction for executing each of the
tasks on its specific HW configuration (power-delay
product improvement) results in an overall reduction of
6.7x.

[0070] Only one task showed slightly worse total ex- 
ecution energy, though showing benefits on execution 
speed. 

[0071] The last column of Figure 6 reports the
energy-delay improvement of each specific HW configuration
compared to the general-purpose counterpart. The energy
required for e-FPGA reconfiguration is always
negligible.

[0072] Measurements show the best energy efficiency
in the range of several MOPS/mW at 1.8V supply. It
lies between conventional ASIP/DSP and dedicated
configurable hardware implementations.
[0073] The full processing unit on a single chip is
implemented in a 0.18µm, 2PL-6ML CMOS embedded
Flash technology; chip area is 70mm². Technology and
device characteristics are summarized in Figure 6, while
a chip micrograph is shown in Figure 7.



Claims 

1. A dynamically reconfigurable processing unit (1)
including an embedded Flash memory device (4) for
non-volatile storage of code, data and bit-streams,
the unit (1) being integrated into a single chip
together with a microprocessor (2) core, further
comprising an S-RAM based embedded FPGA unit (3)
structured for FPGA reconfigurations and having a
specific programming interface (7) connected to a port
(FP) of said Flash memory device (4) through a
DMA channel (8).



2. A dynamically reconfigurable processing unit
according to claim 1, wherein said DMA channel (8)
handles the bitstream transfer while said microprocessor
(2) fetches instructions and data from different
Flash memory ports of said Flash memory device (4):
a wide code port (CP) and a data port (DP).

3. A dynamically reconfigurable processing unit
according to claim 2, wherein said Flash memory device
(4) includes a modular array structure (13)
comprising N memory blocks (16), and wherein a
power block (10), including charge pumps, is
shared among different flash memory modules (16)
through a PMA arbiter (12) in a multi-bank fashion.

4. A dynamically reconfigurable processing unit
according to claim 1, wherein said embedded FPGA
unit (3) exploits the following functions:

i) extension of the processor datapath supporting
a set of additional special-purpose C-callable
microprocessor instructions;

ii) bus-mapped coprocessors, connected to the
system bus through a master/slave interface;

iii) flexible I/O to connect external units or sensors
with application-specific communication
protocols.

5. A dynamically reconfigurable processing unit
according to claim 2, wherein said Flash memory device
(4) includes at least three different access
ports, each for a specific function:

- said code port (CP) optimized for random access
time and the application system;

- said data port (DP) allowing an easy way to access
and modify application data; and,

- said FPGA port (FP) offering a serial access for
a fast download of bit streams for embedded
FPGA (e-FPGA) configurations.

6. A dynamically reconfigurable processing unit
according to claim 2, wherein said third port (FP)
comprises four configuration registers replicating the
information stored in said code port (CP) that must be
used in order to write e-FPGA configuration data.

7. A dynamically reconfigurable processing unit
according to claim 5, wherein said third port (FP) uses
a chip select to access the addressable memory
space and a burst enable to allow burst serial
access.

8. A dynamically reconfigurable processing unit
according to claim 1, wherein said connection between
said interface (7) and said port (FP) is provided
by a local bus (6).

9. A dynamically reconfigurable processing unit
according to claim 5, wherein said Flash memory device
(4) includes four modules (16) each arranged
in at least three programmable user-defined partitions,
each one devoted to a corresponding port.
(54) Reconfigurable ASIC.

(57) A configurable semi-conductor integrated circuit
with particular application as a re-configurable
application specific device. In order to
be able to rapidly switch between two or more,
preferably several, configurations, the invention
provides a configurable semi-conductor integrated
circuit in which an area (1) thereof is
formed with a plurality of cells (2) each having
at least one function and interconnections with
at least some other said cells (2). At least some
of the plurality of cells have interconnections
(25) which are electrically selectable as to their
conduction state, and at least some of the
plurality of cells have interconnections (YA-YD)
which are pre-wired. Each cell has two or more
possible configurations, each configuration being
defined by the cell function and/or its
interconnection with other cells according to cell
configuration data, and further comprising
means (36, 38, 40) storing configuration data for
at least two cell configurations (per cell) and
means (30, 32, 34, 42, 48) to enable one of the
possible cell configurations according to the
cell configuration data selected.
The present invention relates to a configurable
integrated circuit with particular emphasis on a
re-configurable application specific device but without
limitation to same.

Micro-processors are designed into many applications
because of their low cost and high performance.
However, for many applications such as image
compression and digital signal processing they
are too slow. Modifications to the basic micro-processor
architecture have led to several new devices: digital
signal processors (DSP), reduced instruction set
computers (RISC) and custom processors (CP). Each
of these devices is optimised to perform a restricted
number of tasks but at very high speed. Many
applications require several types of such devices to
achieve the necessary level of performance. This is
because of the requirement to perform different types of
computational tasks over a period of time or the
limited capability of each device. Essentially these
devices are used as low cost high performance
numerical engines, each optimised to implement a general
class of algorithms. However, a designer frequently
requires a different architecture to efficiently
implement a new algorithm and the usual practice in such
circumstances is to design a custom processor for this
task. This leads to long and expensive design cycles
and does not allow the designer any flexibility to
change the algorithm.

Field programmable gate arrays (FPGAs) are
commonly used to replace standard products and
they could be used as a numerical engine. However,
they are general purpose devices that cannot
efficiently implement high speed circuits. In order to
achieve the level of complexity that is normally required,
several FPGAs would be necessary which would
increase the cost of the final system. Some FPGAs are
configured using on chip static random access memory
(SRAM) and these devices can be re-programmed
to perform different tasks which could lead to
greater flexibility and higher levels of performance.
However, these devices are connected to an external
source of configuration data that is accessed by the
device to configure internal resources. The time to
configure or re-configure the FPGA can be several
milliseconds, due to the necessity to import
configuration data from an external source, and this time is
several orders of magnitude too slow. Reconfiguration
times of less than 100 nano-seconds are required
for high performance applications. As such
FPGAs cannot be reconfigured fast enough to make
them suitable for use as a high performance numerical
engine. In FPGAs a considerable amount of silicon
area is committed to the configuration memory which
is required to program interconnect resources. Whilst
in theory FPGAs could accommodate an additional
configuration by increasing the amount of on chip
memory which is available to hold configuration data,
this would probably increase the size of the chip by 60
per cent which would be prohibitive for high density
arrays.

The aim of the invention is to provide a re-configurable
architecture which can rapidly switch between
two or more, preferably several, configurations.
Another aim of this invention is to provide a device that
is specifically optimised to carry out functions for
numerically intensive applications. Another aim is to
provide a device that prior to the application of power
contains one or more boot up primary configurations,
suitable for configuring the device into the intended
application. A further aim is to produce a device that
has provision for passing data between successive
configurations of the (base) device. A still further aim
is to ensure that during configuration of the device,
data is held in a safe condition and that switching
currents are minimised. A still further aim is to provide a
configuration cache that will allow updating of
configuration memories that are not currently in use.
Another aim of the invention is to allow the device to
select its own configuration from an external source of
configuration data.

Another aim is to reduce the number of programmable
interconnections by pre-wiring a portion of the
logic into the required configuration.

A yet further aim is to increase performance of 
the device by pre-arranging specified primary func- 
tions to specific areas of the device such primary 
functions being substantially pre-wired. 

Accordingly one aspect of the invention provides
a configurable semi-conductor integrated circuit in
which an area thereof is formed with a plurality of cells
each having at least one function and interconnections
with at least some other said cells, at least some
of the plurality of cells having interconnections which
are electrically selectable as to their conduction state,
and at least some of the plurality of cells having
interconnections which are pre-wired, each cell has two or
more possible configurations, each configuration being
defined by the cell function and/or its interconnection
with other cells according to cell configuration
data, and further comprising means storing configuration
data for at least two cell configurations (per cell)
and means to enable one of the possible cell
configurations according to the cell configuration data
selected.

EP 0 668 659 A2

By pre-wired in relation to interconnect we mean
uninterruptable as to its conduction state. The
configuration data controls selection of the cell function
and/or cell interconnections preferably using decoders
or alternatively controlled directly from memory.
Thus for example the cell's configuration data
determines the routing of the signal through the cell. Direct
connection paths exist between the configuration
stores, the decoders and the selectable functions and
interconnections. The term function as used herein
may be a logic function, arithmetic function, or
interconnect function. A cell may have one or more of
these functions or a combination of two or more of
these. Preferably the configuration data stores are
disposed in the cell. The desired configuration is
selected using an instruction bus receiving signals from
a sequencer and controller. One or more of the
configurations may be pre-wired (i.e. not programmable).
Advantageously one or more of the configuration data
stores are programmable using a data transfer bus.
Where more than one store is programmable an
instruction update bus is provided to write enable the
required configuration store. Configuration stores not
currently accessed to control interconnection and/or
cell function can be updated using the instruction
update bus.
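
The arrangement of local configuration stores can be pictured with a toy software model. It is only a sketch under stated assumptions: the class, its method names and the string-valued "configurations" are hypothetical stand-ins for the stores, the instruction bus and the instruction update bus described above.

```python
from typing import Dict, List, Optional


class Cell:
    """Toy model of a cell with several local configuration stores."""

    def __init__(self, prewired: Dict[int, str], programmable_slots: List[int]):
        # Pre-wired configurations are fixed; programmable ones start empty.
        self._prewired = dict(prewired)
        self._stores: Dict[int, Optional[str]] = {s: None for s in programmable_slots}
        self.active = min(self._prewired)   # boot into a pre-wired configuration

    def select(self, slot: int) -> str:
        """Instruction bus: switch the active configuration."""
        config = self._prewired.get(slot, self._stores.get(slot))
        if config is None:
            raise ValueError(f"slot {slot} holds no configuration")
        self.active = slot
        return config

    def update(self, slot: int, config: str) -> None:
        """Instruction update bus plus data transfer bus: rewrite an idle store."""
        if slot not in self._stores:
            raise ValueError("pre-wired configurations cannot be reprogrammed")
        if slot == self.active:
            raise ValueError("store is currently in use")
        self._stores[slot] = config
```

Because the stores sit in the cell, `select` models a switch that needs no external data transfer, while `update` can fill an idle store in the background.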

Since the present invention is particularly concerned
with an application specific device which is
optimised to perform a restricted number of tasks at
high speed but which is quickly reconfigurable during
program execution (when required) to perform some
other specific task, cells are optimised for a primary
function according to a primary configuration.
Advantageously the primary configuration data is pre-wired.
It is convenient to have two alternate pre-wired
primary configurations. Cells can be and most usually
will be optimised for different primary functions.
Advantageously the pre-wired interconnections are
used in connection with the optimised functions.

A possible primary function is that of an adder. 
Another aspect of the invention provides a multi-bit 
adder for summing at least two multi-bit words com- 
prising a first multi-bit adder block for summing the 
least significant bits and at least one further multi-bit 
adder block for summing the most significant bits and 
having sum selection means wherein said further 
multi-bit adder block calculates the two possible 
sums resulting from a carry out from the previous 
block being equal to '0' and '1' respectively and 
wherein the sum selection means selects the sum of 
the further multi-bit adder block according to the carry 
out calculated from the previous block. 
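
The adder described here behaves like what is commonly called a carry-select adder. The sketch below is a behavioural model only, with assumed word and block widths (16 and 8 bits) and illustrative function names; it requires the word width to be a multiple of the block width.

```python
def add_block(a: int, b: int, carry_in: int, width: int):
    """Add two `width`-bit values; return (sum bits, carry out)."""
    total = a + b + carry_in
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a: int, b: int, width: int = 16, block: int = 8):
    """Sum two `width`-bit words with one low block and carry-select upper blocks."""
    mask = (1 << block) - 1
    result, carry = add_block(a & mask, b & mask, 0, block)   # least significant block
    for shift in range(block, width, block):
        a_blk, b_blk = (a >> shift) & mask, (b >> shift) & mask
        # Both candidate sums are computed without waiting for the carry ...
        sum0, carry0 = add_block(a_blk, b_blk, 0, block)
        sum1, carry1 = add_block(a_blk, b_blk, 1, block)
        # ... and the real carry out of the previous block selects one of them.
        chosen, carry = (sum1, carry1) if carry else (sum0, carry0)
        result |= chosen << shift
    return result, carry
```

In hardware the two candidate sums of each upper block are formed in parallel with the lower block, so the carry only has to drive a selection rather than ripple through every bit.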

In the case of a Digital Signal Processor application,
some cells will be optimised as Arithmetic Logic
Units (ALU) while other cells may be optimised to carry
out functions such as instruction decode or as
processor registers. The number of different cells is
only limited by the size of the array of cells. In practice
the array will be divided into a number of discrete
areas that are particularly efficient at implementing
respective primary functions. It will be apparent that
each of these cells has the capability to implement
another function and usually a range of other functions
according to other configurations. These additional
functions are controlled by the controller and
sequencer whose role is to ensure that the correct
function is available when required. Primary functions
may use general interconnect resources, but
preferably they have their own dedicated resource for
high speed connections between primary functions
of other cells. In this way the performance of the
device is not dependent on a general programmable
interconnect resource and by connecting primary
functions through resources with smaller parasitic loads,
the device can operate faster.

In order to safeguard data when changing between
configurations each cell has a latch controlled
by a function control bit. Transient current is reduced
when switching between configurations by the provision
of a buffer in each cell, the buffer being controllable
as to its state during reconfiguration by a control
line.

It will be apparent that whilst this device has specific
application in the field of numerical engines such
as DSPs, the primary functions can be chosen to suit
other applications. Accordingly the techniques can be
applied to any application. For example, another
application is as a programmable communications device.

Another aspect of the invention also provides a
method of configuring a configurable semi-conductor
integrated circuit in which a sequencer is programmed
with data to facilitate selection of a required configuration
from at least two possibilities. Usually each of
a plurality of cells will have at least two configuration
possibilities. Advantageously the configurations are
programmable and the method further comprises inputting
and storing configuration data. A further advantageous
feature is the ability to program the sequencer
to write over previously stored configuration
data at a prescribed point in operation of the circuit.
An aspect of the invention provides a semi-conductor
integrated circuit in which the circuit configuration is
changed according to a pre-programmed sequence of
configurations during operation of the device.

The present invention will now be described by 
way of example only with reference to the accompa- 
nying drawings; in which:- 

Figure 1 is a schematic layout for a re-configurable
application specific device embodying the invention;

Figures 2 and 3 illustrate diagrammatically the 
feature of the core architecture having different 
configurations and sequential access; 

Figure 4 illustrates diagrammatically the feature
of the core having cells which are optimised to
implement specific functions;
Figure 5 illustrates diagrammatically a primary
configuration for the device as a Digital Signal
Processor (DSP);

Figure 6 illustrates diagrammatically a secondary
configuration for the device as a large multiplier;

Figure 7 illustrates schematically the layout of a
cell including configuration memory means;

Figure 8a illustrates diagrammatically the possible
arrangement of the cells in blocks with optimised
functions;
Figure 8b illustrates schematically programmable
local and global interconnect resources for
the cells;

Figures 9a and 9b illustrate diagrammatically
how the global interconnect resources are connected
to the cell input and output multiplexers;
Figures 9c and 9d illustrate diagrammatically an 
array of cell blocks and the arrangement of cells 
within a cell block; 

Figure 10 illustrates diagrammatically cell output 
state control; 

Figures 11, 12 and 13 illustrate diagrammatically 
three logic cell variants namely an Arithmetic 
Logic Unit function (ALU), an Accumulator func- 
tion (ACC), and a Decode cell function respec- 
tively; 

Figure 14 illustrates diagrammatically examples 
of different functions from the ALU and ACC opti- 
mised core cells; 

Figure 15 illustrates diagrammatically details of 
configurable Static Random Access Memory pro- 
visions; 

Figure 16 illustrates diagrammatically further de- 
tails of the cell configuration memory; 
Figure 17 illustrates diagrammatically instruction 
bus connections for DSP cells; 
Figure 18 illustrates diagrammatically a novel
parallel carry select adder architecture which can
be configured by the device;
Figure 19 illustrates a cell configured to implement
a single stage carry select adder;
Figure 20 illustrates a cell configured to imple- 
ment two carry select adders; 
Figure 21 illustrates an alternative cell configur- 
ation to implement a single stage carry select ad- 
der; and 

Figure 22 illustrates diagrammatically a DSP 
Timing Diagram. 

The present invention is described in the context 
of an integrated circuit intended for an application 
specific device and will be described by way of exam- 
ple in the specific context of a Digital Signal Proces- 
sor (DSP). According to the invention the device is not 
restricted to a fixed architecture, but has the hard- 
ware re-configurable to allow the device (eg. DSP) to 
be optimised for each individual task. Thus at a macro 
level the device may be optimised for a new applica- 
tion for example MPEG, Polygon Engine, Blitter, DMA 
Engine, whilst at a micro level, the device can be opti- 
mised for each OPCODE, eg. MULTIPLE ALU, CUS- 
TOM MULTIPLY. Thus a re-configurable application 
specific device (eg. DSP) allows many custom devic- 
es to be replaced with a single chip. Optimised OP- 
CODES increase performance. In effect the device 
can switch at clock speed between operating as a 
DSP, RISC or custom processor. 

Referring firstly to Figure 1, there is illustrated
a re-configurable application specific digital signal
processor. The chip includes an area 1 of core cells,
Partitioned Static Random Access Memory (SRAM) 3,
a sequencer and controller 5 having control lines
7, clocks 9 and clock lines 11, as well as programmable
input/output 13 and associated data bus 15. Also
shown is a signal Decompress decoder 17, a communications
link 19 and associated input/output and Expansion
porting 21, and address bus 23.

There are a plurality of core cells 2 and these provide
for example (in the case of a DSP configuration),
Instruction Decode, registers, programme counter
and stack pointer facilities. Each core cell can be
programmed to perform a range of functions and certain
core cells are optimised to implement specific functions.
Thus, for example, Figure 4 illustrates
optimisation of certain cells for ALU functions
as at 2a, registers 2b, programme counter 2c, general
counter 2d, instruction decode 2e and Input/output 2f.

One schematic configuration of core cell, denoted
by dotted outline, is shown in Figure 7 and the core cell
includes within it a logic cell 22 having selectable
functions (for example four). Programmable core cell
inputs (eight) (ie. electrically selectable interconnections)
are shown at 25 applied to two 4:1 input
multiplexers 26, 28. The cell output is shown at 27.
Examples of Logic cell configurations are described further
with reference to Figures 11, 12, 13 and 14. Input
multiplexers are controlled by respective 2-4 Decoders
30, 32. A further 2-4 Decoder 34 controls a 4-1
Multiplexer in the logic cell 22 and an output multiplexer
70 is controlled by a 2-4 Decoder 48. Direct pre-wired
connections to the logic cell are indicated by references
YA-YD.

In the Figure 7 illustration the cell includes configurable
memory provisions comprising configuration
cache 36 and instruction cache 38, as well as so
called "hard wired" or fixed configuration provisions
40. For the DSP application the fixed configurations
comprise a primary DSP Boot Configuration set by 3
x 2 bit configuration elements 40a, and a secondary
configuration eg. Multiplier configuration set by 3 x 2
bit configuration elements 40b. It is intended that the
primary (fixed) configuration will be implemented
automatically on boot-up of the device so as to give it its
primary application specific function.

The configuration cache 36 in the illustrated embodiment
comprises four 3 x 2 bit data stores, 36a-d,
which can be write enabled from an instruction update
bus 44 and written with data from Data bus 46.
The instruction cache 38 comprises 8 x 2 bit data
stores which are write enabled from the Instruction
update bus (44) and written with data from the data
bus 46. The instruction cache 38 is read enabled from
the Instruction select bus 42. A 2-4 Decoder 48 enabled
from the instruction select bus 42 selects and
read enables one of the four data stores 36a-d according
to the data store of the instruction cache selected.
The output of Decoder 48 also facilitates the
direct configuration of the logic cell by controlling the
4:1 output multiplexer 70. Also illustrated is a function
control bit 50, which has connections from the read and
write enable lines (42, 44) and into the logic cell 22.
The function control bit 50 controls latch 54 (see Figure 10).

Figure 16 illustrates, for the fixed configuration
provisions (40) and the configuration cache 36, the
read (42), write (44) and data (46) connections. Note
that both read and write provisions are present for the
configuration cache 36 only.

Reverting to Figures 2 and 3, each of blocks
2', 2" and 2"' represents a configuration of the core 2.
Large blocks of functionality are accessed as a series
of configurations. Each new configuration receives
data from the last using inter-process connections 52
and cells 54 designated for latching critical data.
Other cells 54 are designated to act as inputs or outputs.
Reconfiguration time can be of the order of
10 nsec. The core architecture is optimised to implement
each OPCODE. This allows the word size of
each arithmetic function to be adjusted to the required
provision. Thus, referring to Figure 3, a first core
configuration (OPCODE 1) executes a 16 bit multiply and
cos function, a second core configuration (OPCODE
2) carries out a 32 x 32 bit multiply function, and a
third configuration (OPCODE 3) carries out a 64 bit
ADD function.

Reference is now made to Figure 10 which illustrates
the output state control as applicable to the like
of the cell illustrated in Figure 7, and the corresponding
cell components appropriately referenced are
illustrated with the exception of the instruction cache 38.

As has been mentioned above certain cells are 
designated for latching critical data and hence the 
cells have a latch provision 54 with inputs from the 
function control bit 50 and a hold input line 56. These 
function to preserve the state of data from cells be- 
tween configurations. In addition a buffer 60 is pro- 
vided in order to reduce transient current when 
switching between configurations by setting its out- 
put state to a known condition. 

The cell interconnect resources are now described
with reference to Figures 8a, 8b, 9a, and 9b. Figures
8a and 8b show diagrammatically how cells
might be arranged in regular blocks (B) (eg. rows and
columns), with the blocks including cells which are
optimised for different functions. Thus Figure 8b
shows columns of ACC cells, ALU cells and shift cells,
and two rows of Decode cells. Columns of cells each
have two global (Y) buses (Y1, Y2, Y3, Y4 ... YN-1,
YN) and the rows of cells each have at least two global
(X) buses (X1, X2 ... Xn-1, Xn). The Decode cells
head up the columns of each block and have three X
buses. Bus switches BS are provided in the Y buses
between adjacent blocks. In addition there are hidden
(or pre-wired direct connection) Y buses, YA-YD.
These run from the decode cells to all the cells in the
column below. In addition local direct connection
paths are preferred between cells. Thus, taking as an
example cell SC in Figure 8b, it has input connections
from outputs of an upper adjacent cell, a lower adjacent
cell, a right adjacent cell, a left adjacent cell, and
a next left adjacent cell. These connections are
designated U, D, R, L, J. Not all cell variations will
necessarily have all the local connections. The majority
of these local connections are electrically selectable
as to their conduction state, but most usually the left
adjacent connection will be a pre-wired connection.

Figure 9a illustrates, for one cell as for all core
cells, how an input multiplexer 26 controls selection
of inputs from X and Y buses and an output multiplexer
70 controls selection of outputs to the same X buses
and next column of Y buses.

The cells are arranged in 10 x 8 blocks and an example
of such an array of cell blocks is illustrated in
Figure 9c. Blocks 100 are formed in an 8 x 4 array and
a programmable input/output 102, data buses and
switches 104 and partitioned SRAM 106 are also
shown. Each block 100 comprises an array of 10 x 8
cells and conveniently, columns of cells within the
block have a similar primary configuration. For example,
Figure 9d illustrates a block 100 having two columns
of cells 100 a & b configured as multiplexer
cells, columns 100 c as a product adder, 100 d barrel
shifter cells, 100 e arithmetic and logic cells, 100 f
accumulator cells and columns 100 g & h configured as
multiplier expansion cells. The columns in each block
are headed up by decode cells.

Referring now to Figure 15, the configurable static
random access memory (SRAM) 3 stores partition
data passed to it from the sequencer and controller
along partition data bus 72. The operation of the DSP
requires the storing and retrieving of data and the
provision of the SRAM on the device ensures that access
to the stored data is faster than if the SRAM were
located externally.

The sequencer and controller 5 controls the operation
of buses 42, 44, 45 and 46. Hence, the sequencer
and controller 5 includes the control of the
operation of selecting individual data stores of cells,
sending data to the stores and controlling the sequence
of implementation of configuration data stored
within each cell. The necessary control instructions for
the sequencer and controller 5 are provided by an
external source of memory (not shown). In addition to
the above operations, the controller 5 can select
individual data stores not currently used such that they
can be updated with new configurations from the
external memory.

Figures 11, 12 and 13 illustrate respective ALU,
ACC and Decode cell variants. Appropriate references
have been used as previously referred to.

Figure 13 shows an example of a cell optimised
for decode. Two decode cells will head up the blocks
of cells as shown in Figures 8a and 8b. The illustrated
variation is the one which has the pre-wired interconnections
YA, YB which feed down to each of the cells
below. The other decode will generate the YC, YD
pre-wired interconnections. Thus the ALU type cells
of Figure 11 have pre-wired connections YA, YB,
whilst the ACC type cells have pre-wired connections
YA, YB, YC, YD. Note also that for the ALU and ACC
variants the left adjacent connection L is pre-wired,
and for the ALU cell the Cin, Cout is a pre-wired
interconnection running the length of the column of cells.
Other X and Y buses are as described above.

Control signals from the outputs of the decode 
and for inputs of the cell variants will be pre-wired for 
the optimised cell functions, ie. for any functions 
which are known to be needed for the specific appli- 
cation. 

Figure 14 illustrates some of the different functions
which are available from the ACC and ALU core
cells of Figures 12 and 11 respectively.

Figure 17 illustrates an alternative internal cell 
arrangement for the case of DSP cells (shown simplified)
with the cell input shown simply at 25 and cell
output at 27. The memory comprises 8 x 3 bit data 
stores and a 3-8 Decoder 80 is provided such that one 
of the eight selectable options (eg. functions or inter- 
connect) contained in the logic cell can be selected. 
In order to update a particular data store within a par- 
ticular cell there is provided a memory select 45 
(omitted from the illustrations of the previously de- 
scribed cell arrangement) and hence the required cell 
can be selected and the particular data store to be 
write enabled or read enabled is selected by the in- 
struction update bus (44) or instruction bus (42). Data 
is written to the data store from memory data bus (46) 
(not illustrated in Figure 17). 

A novel adder structure which can be configured
by the device will now be described with reference to
Figures 18 to 21. A 16-bit adder is illustrated in Figure
18 and indicated generally by numeral 60. The adder
comprises a plurality of carry select adders 62 forming
a first multi-bit adder block 64 and a second multi-bit
adder block 66. The adder 60 sums two 16 bit
words indicated as a1, a2, a3 ... a16 and b1, b2,
b3 ... b16 in order to derive a sum indicated by s1, s2,
s3 ... s16 and carry element 'Cout'.

First multi-bit adder block 64 sums the eight least
significant bits of each 16 bit word and for each bit
there is an associated carry select adder 62. Each
carry select adder comprises two inputs An, Bn
(wherein 'n' is the number of the bit), output 68, carry
in 70, carry out 72 and a first and second 2:1 multiplexer
74, 76. The first input to the first multiplexer 74
is equal to the value of An + Bn assuming the carry
in is '0' and the second input assumes that carry in to
be '1'. The output Sn is selected by the carry in 70.

The two inputs to the second multiplexer 76 are
equal to the carry resulting from the sum of An and
Bn with the carry in being equal to '0' and '1'. The carry
out 72 is selected by carry in 70. Obviously, the carry
in to the first carry select adder will be equal to '0'.

The second multi-bit adder block 66 sums the
eight most significant bits of each 16 bit word and for
each bit there are two associated carry select adders,
78, 80. Each of the carry select adders 78, 80 is
constructed in a similar manner as described above. Carry
select adders 78 sum the two eight bit words ie. a9,
a10 ... a16 and b9, b10 ... b16, assuming that the carry
out from the first adder block 64 is '1' and carry select
adders 80 assume that the carry out is '0'. Therefore,
for each bit two outputs are calculated and fed into an
associated multiplexer 82. The output providing Sn is
selected by the carry out from the first adder block 64.

In operation, the first adder block calculates the
addition of the eight least significant bits and produces
a carry out value. Simultaneously, the second adder
block calculates the two possible sums of the addition
of the most significant bits and the correct sum
is selected by the carry out produced by adder block
64. In consequence the time delay to calculate a 16
bit addition is taken to be the delay in the addition of
the first eight bits (8ADD) plus the delay in selecting
the sum of the last eight bits, ie. one multiplexer delay
(MUX).
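
The two-block operation just described can be sketched in software. The Python model below is an illustration only and is not the circuit of Figure 18: the names ripple_add and carry_select_add16 are hypothetical, each eight bit block is modelled with ordinary integer addition rather than with per-bit carry select adders 62, and indexing a tuple stands in for multiplexer 82.

```python
def ripple_add(a, b, carry_in, width=8):
    """Add two width-bit values plus a carry in; return (sum, carry_out)."""
    total = a + b + carry_in
    mask = (1 << width) - 1
    return total & mask, total >> width


def carry_select_add16(a, b):
    """Sum two 16 bit words using a low block and a speculative high block."""
    mask = 0xFF
    # First block (64): least significant eight bits, carry in fixed at 0.
    low_sum, low_carry = ripple_add(a & mask, b & mask, 0)

    # Second block (66): both possible sums are prepared in advance,
    # one assuming a carry of 0 and one assuming a carry of 1.
    hi_a, hi_b = (a >> 8) & mask, (b >> 8) & mask
    candidates = (ripple_add(hi_a, hi_b, 0), ripple_add(hi_a, hi_b, 1))

    # Sum selection: the carry out of the first block picks the result.
    high_sum, carry_out = candidates[low_carry]
    return (high_sum << 8) | low_sum, carry_out
```

In hardware the two candidate sums are produced at the same time as the low block result, which is why only the selection step adds delay; the sequential Python calls do not model that timing.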

For each additional eight bit adder block the time
delay increases by one multiplexer delay. For example, a
thirty two bit adder would result in a propagation delay of
8ADD + 3 x MUX. In consequence, the adder structure
described results in an improved speed of operation
compared to that of a conventional adder structure.
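
The delay relationship stated above can be written as a small helper. This is a sketch under assumptions: the function name carry_select_delay and its parameters are hypothetical, the delays are in arbitrary units supplied by the caller, and the model simply counts one eight bit addition plus one multiplexer delay for each further block.

```python
def carry_select_delay(word_bits, add8_delay, mux_delay, block_bits=8):
    """Estimate propagation delay as 8ADD + (blocks - 1) x MUX."""
    if word_bits < block_bits or word_bits % block_bits:
        raise ValueError("word_bits must be a positive multiple of block_bits")
    further_blocks = word_bits // block_bits - 1
    return add8_delay + further_blocks * mux_delay
```

With hypothetical unit delays of 8.0 for 8ADD and 1.0 for MUX, a 16 bit adder gives 8ADD + 1 x MUX and a 32 bit adder gives 8ADD + 3 x MUX, matching the figures in the text.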

Figure 20 illustrates an alternative cell structure
wherein the two carry select adders requiring two cells
can be replaced by a single configured cell.

Figure 21 illustrates a conventional circuit for sin- 
gle stage carry select adder which may be used as an 
alternative to the circuit of Figure 19. 

The operation of the device will now be described
wherein initially, as described above, the configuration
provisions 40 are 'hard wired' or fixed with a DSP
configuration 40a and a multiplier configuration 40b.

An external memory store (not shown) contains
all the necessary configuration data in order to control
the controller and sequencer such that each of the
data stores (36a-d, 38) in each cell can be programmed.
In order to program a data store a typical procedure
would be to firstly select the cell by memory select
45, select the data store to be write enabled by
instruction update bus 44 and to write data to the
selected store via data bus 46.
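
The three-step programming procedure above can be sketched as follows. This is a minimal software analogy, not the device's interface: the class names ProgrammableCell and CellArray, the method program_store and its parameter names are hypothetical, and the buses are reduced to plain integer arguments.

```python
class ProgrammableCell:
    """One core cell holding a four-entry configuration cache (36a-d)."""

    def __init__(self):
        self.config_cache = [0, 0, 0, 0]


class CellArray:
    """A small array of cells addressed through the buses named in the text."""

    def __init__(self, n_cells):
        self.cells = [ProgrammableCell() for _ in range(n_cells)]

    def program_store(self, memory_select, update_select, data):
        # Step 1: memory select 45 picks the cell.
        cell = self.cells[memory_select]
        # Step 2: instruction update bus 44 write-enables one data store.
        # Step 3: data bus 46 carries the 3 x 2 bit configuration word.
        cell.config_cache[update_select] = data & 0b111111


array = CellArray(n_cells=4)
array.program_store(memory_select=2, update_select=1, data=0b011011)
```

Only the addressed store of the addressed cell changes; every other store keeps its previous contents, which is what allows an unused store to be updated while another is in use.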

Each of the four data stores of the configuration
cache 36 contains sufficient configuration data to select
the input to the logic cell 22 and to also select one
of the functions contained within the logic cell.
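
As an illustration of how one 3 x 2 bit data store could select both the inputs and the function, the sketch below splits a six bit word into three two bit fields. The field order, the assignment of the first four cell inputs to multiplexer 26 and the last four to multiplexer 28, the four logic functions and all names (CellConfig, decode_config, evaluate_cell, FUNCTIONS) are assumptions made for the example and are not taken from the specification.

```python
from typing import NamedTuple


class CellConfig(NamedTuple):
    mux26_select: int     # chooses one of four inputs for multiplexer 26
    mux28_select: int     # chooses one of four inputs for multiplexer 28
    function_select: int  # chooses one of four logic cell functions


def decode_config(word):
    """Split a six bit configuration word into three two bit fields."""
    return CellConfig((word >> 4) & 0b11, (word >> 2) & 0b11, word & 0b11)


# Hypothetical one bit functions standing in for the logic cell options.
FUNCTIONS = [
    lambda x, y: x & y,        # 0: AND
    lambda x, y: x | y,        # 1: OR
    lambda x, y: x ^ y,        # 2: XOR
    lambda x, y: 1 - (x & y),  # 3: NAND
]


def evaluate_cell(word, inputs):
    """Apply a configuration word to eight one bit cell inputs."""
    cfg = decode_config(word)
    x = inputs[cfg.mux26_select]      # assumed: inputs 0-3 feed mux 26
    y = inputs[4 + cfg.mux28_select]  # assumed: inputs 4-7 feed mux 28
    return FUNCTIONS[cfg.function_select](x, y)
```

Each two bit field plays the role of the value presented to one of the 2-4 Decoders, so changing the selected data store changes both the routing and the function in one step.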

The initial boot up operation of the device results
in a configuration as per either of the primary
configurations 40a, 40b according to the boot up
instruction. Thus for example the DSP or Multiplier
configuration is established.

However, if the device is required to implement
another configuration eg. a divide function, then the
controller and sequencer 5 selects and write enables
the required data store of the configuration cache 36
of each cell necessary to implement the configuration.
The external memory supplies the necessary
data as to which cell and data stores are to be selected
in order to implement the required configuration.

There is also the option for adopting other programmed
configurations from the configuration
cache and for writing and substituting other configurations.

Thus for the example given, the four configurations
possible from the configuration cache may not
be sufficient. Software programming can be used to
implement another configuration. The programmer
will be able to refer to the technical specifications for
the device and determine how the desired function/configuration
can be implemented (for example
many possible architecture changes will be listed,
perhaps in terms of a load instruction). Thus whilst
load instructions 1-4 might represent the most typical
configurations which are to be stored in the configuration
cache, the programmer determines from the
technical specification that load instruction 33 for
example is required. Thus the programmer will have the
instruction loaded into the configuration cache. There
will be instances where more configurations are required
to process the incoming data than can be stored
in the cell memory for access at clock speed. However,
this difficulty can be overcome by re-programming
a "redundant" configuration cache with the "additional"
configuration data in advance of its requirement,
by including the re-configuration instruction in
the software programme. The sequencer can control
re-configuration at clock speed, whilst the data from
the configuration is held safe in the latch cells. The
four configurations (36a-36d) of the cache can be re-used
in different combinations at different cell sites.
This is facilitated by instruction cache (38) which can
select different local cell configurations from a global
instruction placed on instruction bus 42.
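
The way a single global instruction can produce different local configurations may be sketched as a two-level lookup. This is an illustrative model only: the class InstructionCell, its method active_config, the configuration names and the cache contents are all hypothetical. The only details taken from the description are the eight two bit entries of the instruction cache and the four stores of the configuration cache.

```python
class InstructionCell:
    """A cell with an 8-entry instruction cache and a 4-entry config cache."""

    def __init__(self, config_cache, instruction_cache):
        assert len(config_cache) == 4 and len(instruction_cache) == 8
        self.config_cache = config_cache
        self.instruction_cache = instruction_cache

    def active_config(self, global_instruction):
        # The global instruction read-enables one instruction cache entry;
        # its two bit value then selects one of the four configuration stores.
        store_index = self.instruction_cache[global_instruction] & 0b11
        return self.config_cache[store_index]


# Two cell sites with different local mappings (hypothetical contents).
alu_cell = InstructionCell(["add", "sub", "and", "or"],
                           [0, 1, 2, 3, 0, 1, 2, 3])
acc_cell = InstructionCell(["load", "hold", "accumulate", "clear"],
                           [2, 2, 0, 0, 1, 1, 3, 3])

# The same global instruction yields a different configuration per cell.
selected = [cell.active_config(0) for cell in (alu_cell, acc_cell)]
```

Because the mapping is held locally in each cell, the same four stored configurations can be combined differently across the array without widening the global instruction.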



Claims 

1. A configurable semi-conductor integrated circuit
in which an area (1) thereof is formed with a plurality
of cells (2) each having at least one function
and inter-connections with at least some other
said cells (2), characterised in that at least some
of the plurality of cells (2) have interconnections
(25) which are electrically selectable as to their
conduction state, and at least some of the plurality
of cells (2) have interconnections (YA-YD)
which are pre-wired, each cell has two or more
possible configurations, each configuration being
defined by the cell function and/or its interconnection
with other cells according to cell configuration
data, and further comprising means
(36, 38, 40) storing configuration data for at least
two cell configurations (per cell) and means (30,
32, 34, 42, 48) to enable one of the possible cell
configurations according to the cell configuration
data selected.

2. A configurable semi-conductor integrated circuit
as claimed in claim 1 in which means (36, 38, 40)
storing at least two cell configurations are present
in the cell.

3. An integrated circuit as claimed in claim 1 or 2 in 
which the means for selecting the required cell 
configuration comprises an instruction bus (42) 
communicating with the said configuration data.

4. An integrated circuit as claimed in claim 1, 2 or 3
in which at least one of the cell configurations is 
pre-wired (40a, 40b) to configure the integrated 
circuit with an application specific function when 
selected. 

5. An integrated circuit as claimed in claim 4 in 
which there are two pre-wired (40a, 40b) applica- 
tion specific functions. 

6. An integrated circuit as claimed in any one of the 
preceding claims in which there is at least one 
programmable cell configuration. 

7. An integrated circuit as claimed in any one of the 
preceding claims in which there are both pre-wi- 
red and programmable cell configurations. 

8. An integrated circuit as claimed in claim 6 or 7 fur- 
ther comprising a write enable bus (44), and a 
data bus (46) communicating with the means (36, 
38) storing the cell configuration data for the pur- 
pose of rewriting data to the store for re-program- 
ming purposes. 



9. An integrated circuit as claimed in any one of the
preceding claims further comprising means storing
a plurality of configuration selection instructions,
an instruction select bus (42) communicating
with said means and an output signal path
for selecting the required configuration data store
to be implemented or directly effecting cell
configuration.

10. An integrated circuit as claimed in claim 9 further
comprising an instruction write bus (44) and an
instruction data bus (46) for writing to the instruction
storing means (36, 38).

11. An integrated circuit as claimed in any one of the
preceding claims in which means (54) is provided
to preserve the output between configurations.

12. An integrated circuit as claimed in claim 11 in
which said means comprises a latch (54) wherein
each cell incorporates a latch to preserve its output.

13. An integrated circuit as claimed in any one of the 
preceding claims in which the cells are optimised 
for a primary function. 

14. An integrated circuit as claimed in claim 13 com- 
prising cells which are optimised for different pri- 
mary functions. 

15. An integrated circuit as claimed in any one of the 
preceding claims including means (60) to reduce 
transient current when switching between config- 
urations. 

16. An integrated circuit as claimed in claim 15 in 
which said means comprises a controllable buffer 
(60) in the output line of each cell. 

17. An integrated circuit as claimed in any one of the
preceding claims further comprising sequencer 
means (5) to control the availability and selection 
of the configuration. 

18. An integrated circuit as claimed in any one of the 
preceding claims comprising decode means (30, 
32, 34, 48) in each cell (2) to decode configura- 
tion state to control the configuration of each cell. 

19. An integrated circuit as claimed in any one of 
claims 4 or 5, 13 or 14 in which the configuration 
data store corresponding to the primary or appli- 
cation specific function of the cell is contained 
within the device in a non-volatile memory. 

20. An integrated circuit as claimed in any one of 
claims 13 or 14 in which the pre-wired (hidden) in- 
terconnect resources interconnect optimised 
cells for efficient implementation of the primary 
(application specific) functions. 

21. A multi-bit adder for summing at least two multi-bit
words comprising a first multi-bit adder block
(64) for summing the least significant bits and at
least one further multi-bit adder block (66) for
summing the most significant bits and having
sum selection means wherein said further multi-bit
adder block calculates the two possible sums
resulting from a carry out from the previous block
being equal to '0' and '1' respectively and
wherein the sum selection means selects the
sum of the further multi-bit adder block according
to the carry out calculated from the previous
block.

22. A method of configuring a configurable semi-conductor
integrated circuit having a plurality of
cells (2) with at least two configuration possibilities
in which a sequencer (5) is programmed with
data to facilitate selection of the required cell
configuration.

23. A method as claimed in claim 22 further comprising
inputting and storing cell configuration data.

24. A method as claimed in claim 22 or 23 further
comprising programming the sequencer with
data to write over previously stored configuration
data at a prescribed point in operation of the circuit.

25. A configurable semi-conductor integrated circuit
characterised in that circuit configuration is
changed according to a pre-programmed sequence
of configurations during operation of the
device.

26. An integrated circuit as claimed in claim 25 in
which an area thereof is formed with a plurality of
cells, each cell having two or more possible configurations,
each configuration being defined by
the cell function and/or its interconnection with
other cells according to configuration data.
